plugin: minor improvements to logic, variable initialization#817
Conversation
Problem: In check_and_release_held_jobs (), held_job_id is not set in the case where removing the max_sched_jobs dependency fails, which is inconsistent with the other checks throughout this function. Add the initialization of held_job_id in the if-branch check for removing the max_sched_jobs dependency from a held job.
Problem: The "goto error" block is unconditionally executed at the end of check_and_release_held_jobs () because there is no "return" statement before the label, even on success. This incorrectly populates the journal with error messages, even if the function succeeds. Add a "return 0" call before the label.
Problem: In run_cb (), the held_jobs attribute for an Association object is not checked to see if it is empty before calling check_and_release_held_jobs (). Check to see that it has at least one entry in it before calling check_and_release_held_jobs ().
| return 0; | ||
| error: | ||
| flux_jobtap_raise_exception (p, | ||
| held_job_id, | ||
| "mf_priority", | ||
| 0, | ||
| "check_and_release_held_jobs: failed to " | ||
| "remove %s dependency from job %ju", | ||
| dependency.c_str (), | ||
| held_job_id); | ||
| return -1; |
There was a problem hiding this comment.
I'm surprised this issue wasn't noticed before. Without the return 0 added here, wouldn't every job have this exception raised?
There was a problem hiding this comment.
Yes, exactly 🤦 this floods the journal with these exception messages. I think I've mentioned it in a meeting or two a while ago that inspecting the log showed so many of these messages, even though nothing was wrong. Now I know why 😅
There was a problem hiding this comment.
Wouldn't there be job failures though in addition to just the log messages?
There was a problem hiding this comment.
Oh, I see what you're saying. Well, I actually think in the case that removing any and all dependencies succeeds and everything proceeds normally, held_job_id (the variable passed in to flux_jobtap_raise_exception) remains 0, so the job itself would not fail, but I'm pretty sure any callback that called this function would get a return value of -1. Does that make sense?
|
Thanks for the review @jameshcorbett! Setting MWP here |
Merge Queue StatusRule:
This pull request spent 5 seconds in the queue, with no time running CI. Required conditions to merge
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #817 +/- ##
==========================================
- Coverage 82.93% 82.62% -0.32%
==========================================
Files 27 27
Lines 2479 2480 +1
==========================================
- Hits 2056 2049 -7
- Misses 423 431 +8
🚀 New features to boost your workflow:
|
Problem
There are few minor issues in the priority plugin:
check_and_release_held_jobs (),held_job_idis not set in the case where removing themax_sched_jobsdependency fails, which is inconsistent with the other checks throughout this function.goto errorblock is unconditionally executed at the end ofcheck_and_release_held_jobs ()because there is noreturnstatement before the label, even on success. This incorrectly populates the journal with error messages, even if the function succeeds.run_cb (), theheld_jobsattribute for an Association object is not checked to see if it is empty before callingcheck_and_release_held_jobs ().This PR includes minor fixes for all three problems listed above.